Method to their March Madness: Insights from Mining a Novel Large-Scale Dataset of Pool Brackets

Authors

  • Mason Wright
  • Jenna Wiens
Abstract

Each March, the NCAA Men’s Basketball Tournament attracts tens of millions of viewers, a billion dollars in revenue, and millions of pool brackets from fans attempting to predict the tournament outcome. Previous studies have examined March Madness pools, but no prior work gives an in-depth exploratory analysis of a large-scale bracket dataset. We present a novel dataset of over 200,000 brackets collected from an online pool platform, evenly split between the 2015 and 2016 tournaments. An exploratory analysis of the data reveals insights about the strategies of online pool entrants, which range from rational to absurd. The pool bracket distribution is shown to have surprising quirks that are stable from year to year, such as a large number of all-upsets brackets, and a tendency to over-back the top-ranked team. This exploratory work is a first step toward a generative model that can be used to predict the empirical bracket distribution, given tournament features such as estimated pairwise win probabilities.


Similar resources

Vehicle Make and Model Recognition Using a Set of Discriminative Parts

In fine-grained recognition, the main category of object is well known and the goal is to determine the subcategory or fine-grained category. Vehicle make and model recognition (VMMR) is a fine-grained classification problem. It includes several challenges like the large number of classes, substantial inner-class and small inter-class distance. VMMR can be utilized when license plate numbers ca...


Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies, which need to be understood and have their risks analysed before the project can begin if it is to succeed. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other are often complex and form a network. Discovery of this network of technologies is time consumi...


Making Sense of the Mayhem: Machine Learning and March Madness

The goal of our research was to be able to accurately predict the outcome of every matchup in a March Madness bracket. This is an extraordinarily difficult problem because of the high amount of variance in college basketball and the sheer number of games that are played in the tournament. In the history of March Madness, no one has ever created a perfect bracket. Last year, Warren Buffett agree...


Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...


High performance of the support vector machine in classifying hyperspectral data using a limited dataset

To prospect mineral deposits at regional scale, recognition and classification of hydrothermal alteration zones using remote sensing data is a popular strategy. Due to the large number of spectral bands, classification of the hyperspectral data may be negatively affected by the Hughes phenomenon. A practical way to handle the Hughes problem is preparing a lot of training samples until the size ...




Publication date: 2016